ball optimization oracle
Acceleration with a Ball Optimization Oracle

Carmon, Yair

Neural Information Processing Systems

In the introduction we discuss exact oracles for simplicity, but our results account for inexactness. Our results hold for any weighted Euclidean (semi)norm.



Review for NeurIPS paper: Acceleration with a Ball Optimization Oracle

Neural Information Processing Systems

This paper is concerned with optimization via a "ball optimization oracle", which returns the minimizer of a function restricted to an L2 ball of radius r around a query point x. The authors establish an oracle complexity of roughly (R/r)^{2/3} by combining this oracle with a Monteiro-Svaiter acceleration scheme, and show that the oracle can be implemented efficiently for a variety of important machine learning problems. The ideas in this paper are elegant and surprising, given how deceptively simple the oracle is. The reviewers were unanimously positive about this work, and everyone agrees it is an important theoretical contribution to the optimization community.
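To make the oracle concrete, here is a minimal sketch of an (approximate) ball optimization oracle implemented via projected gradient descent on the L2 ball; the step size, iteration count, and function names are illustrative choices, not the paper's implementation.

```python
import numpy as np

def ball_oracle(grad_f, x_center, r, steps=200, lr=0.05):
    """Approximately minimize f over the L2 ball of radius r around
    x_center by gradient steps followed by Euclidean projection.
    (Illustrative sketch; not the scheme analyzed in the paper.)"""
    x = x_center.copy()
    for _ in range(steps):
        x = x - lr * grad_f(x)           # unconstrained gradient step
        d = x - x_center
        norm = np.linalg.norm(d)
        if norm > r:                     # project back onto the ball
            x = x_center + d * (r / norm)
    return x

# Toy instance: f(x) = ||x - b||^2 / 2, so grad_f(x) = x - b.
# With b outside the ball, the minimizer lies on the ball's boundary.
b = np.array([3.0, 0.0])
x0 = np.zeros(2)
x_star = ball_oracle(lambda x: x - b, x0, r=1.0)
```

For this quadratic the constrained minimizer is the boundary point closest to b, i.e. (1, 0), which the sketch recovers; the acceleration frameworks discussed on this page treat such an oracle as a black box and query it at a sequence of carefully chosen centers.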


Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization

Jambulapati, Arun, Sidford, Aaron, Tian, Kevin

arXiv.org Artificial Intelligence

We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state-of-the-art methods, our result closes a gap between the best-known query depth and the best-known computational depth of parallel algorithms. Our method starts with the ball acceleration framework of previous parallel methods, i.e., [CJJJLST20, ACJJS21], which reduces the problem to minimizing a regularized Gaussian convolution of the function constrained to Euclidean balls. By developing and leveraging new stability properties of the Hessian of this induced function, we depart from prior parallel algorithms and reduce these ball-constrained optimization problems to stochastic unconstrained quadratic minimization problems. Although we are unable to prove concentration of the asymmetric matrices that we use to approximate this Hessian, we nevertheless develop an efficient parallel method for solving these quadratics. Interestingly, our algorithms can be improved using fast matrix multiplication and use nearly-linear work if the matrix multiplication exponent is 2.
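The reduction above ends at stochastic unconstrained quadratic minimization: minimizing E[(1/2) x^T A x - b^T x] given only noisy samples of A. A minimal sequential SGD sketch of that subproblem is below; the sampling model, step size, and iteration count are illustrative assumptions, and the paper's actual solver is a parallel method with very different guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_stochastic_quadratic(sample_Ab, d, steps=2000, lr=0.05):
    """Minimize E[(1/2) x^T A x - b^T x] from unbiased samples (A_i, b_i)
    using plain SGD on the stochastic gradient A_i @ x - b_i.
    (A sketch of the subproblem only, not the paper's parallel solver.)"""
    x = np.zeros(d)
    for _ in range(steps):
        A_i, b_i = sample_Ab()
        x -= lr * (A_i @ x - b_i)
    return x

# Toy instance: E[A] = I (identity plus symmetric noise), fixed b,
# so the minimizer of the expected quadratic is x = E[A]^{-1} b = b.
d = 3
b = np.ones(d)

def sample_Ab():
    noise = 0.1 * rng.standard_normal((d, d))
    return np.eye(d) + (noise + noise.T) / 2, b

x_hat = solve_stochastic_quadratic(sample_Ab, d)
```

The point of the sketch is only the interface: once a ball-constrained step is rewritten as such a stochastic quadratic, standard first-order machinery applies, and the paper's contribution is solving these quadratics with low parallel depth despite the Hessian approximations not provably concentrating.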


ReSQueing Parallel and Private Stochastic Convex Optimization

Carmon, Yair, Jambulapati, Arun, Jin, Yujia, Lee, Yin Tat, Liu, Daogao, Sidford, Aaron, Tian, Kevin

arXiv.org Machine Learning

We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For an SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d \epsilon_{\text{dp}}^{-2}$, then $(\epsilon_{\text{dp}}, \delta)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 \epsilon_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.
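The core reweighting idea can be sketched in a few lines: the gradient of the Gaussian smoothing $f_\sigma(x) = \mathbb{E}_{z \sim \mathcal{N}(x, \sigma^2 I)} f(z)$ can be estimated from samples drawn around a fixed reference point $x_0$ by multiplying each gradient query by the Gaussian density ratio $\mathcal{N}(z; x, \sigma^2 I)/\mathcal{N}(z; x_0, \sigma^2 I)$, so one batch of queries serves many nearby points. The snippet below is an illustrative importance-sampling sketch of this reweighting, not the paper's estimator or its variance analysis; the function names and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def reweighted_smoothed_grad(grad_f, x, x0, sigma, n=4000):
    """Estimate grad of f_sigma(x) = E_{z ~ N(x, sigma^2 I)} f(z)
    using samples z drawn around a reference point x0, reweighted by
    the density ratio N(z; x) / N(z; x0). (Illustrative sketch of the
    reweighting idea behind ReSQue, not the paper's construction.)"""
    z = x0 + sigma * rng.standard_normal((n, len(x0)))
    # log of the Gaussian density ratio N(z; x, s^2 I) / N(z; x0, s^2 I)
    logw = (np.sum((z - x0) ** 2, axis=1)
            - np.sum((z - x) ** 2, axis=1)) / (2 * sigma ** 2)
    w = np.exp(logw)
    return (w[:, None] * grad_f(z)).mean(axis=0)

# Toy check: for f(x) = ||x||^2 / 2 the smoothed gradient is exactly x.
x0 = np.zeros(2)
x = np.array([0.1, -0.05])
g = reweighted_smoothed_grad(lambda z: z, x=x, x0=x0, sigma=1.0)
```

Because the queries depend only on $x_0$, they can be issued once and reused for every point the acceleration scheme visits inside a small ball around $x_0$, which is how the estimator plugs into the ball-oracle framework; the weights stay well-behaved precisely because $\|x - x_0\|$ is small relative to $\sigma$.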